13. Details of Johansen Test (optional)

Details of the Johansen test (very optional)

You may be wondering how we get these coefficients, and how we check whether three stocks have a cointegrated relationship. For a closer look, let’s introduce a bit of math. Recall from the lesson on time series that a vector autoregression describes a stock’s current value in terms of not only its own prior values, but also the prior values of other stocks. Let’s use two stocks as an example. Note that I’m using the variable names “IBM” and “GE” to refer to the price series of these stocks. The \mu is a constant (intercept) term for each stock’s time series, which sets its baseline level. The “e” is an error term for each stock.

IBM_{t} = \mu_{IBM} + \beta_{1,1} \times IBM_{t-1} + \beta_{1,2} \times GE_{t-1} + e_{1,t}

GE_{t} = \mu_{GE} + \beta_{2,1} \times IBM_{t-1} + \beta_{2,2} \times GE_{t-1} + e_{2,t}

We normally use matrices to make this easier to work with, so the equations above can be written as:
\begin{bmatrix}IBM_t\\ GE_t\end{bmatrix}=\begin{bmatrix}\mu_{IBM} \\\mu_{GE}\end{bmatrix}+\begin{bmatrix}\beta_{1,1} & \beta_{1,2}\\ \beta_{2,1} & \beta_{2,2} \end{bmatrix}\begin{bmatrix}IBM_{t-1}\\ GE_{t-1}\end{bmatrix}+ \begin{bmatrix}e_{1,t}\\ e_{2,t}\end{bmatrix}

To make things simpler to write, we’ll write the 2 x 2 matrix of betas with a capital B, and we’ll denote the vector of the two stocks with a lowercase x. We’ll write the vector of \mu ’s with a single \mu , and so on. So the vector autoregression with a lag of one is:

x_{t} = \mu + B x_{t-1} + e_{t}

For a lag of p, this formula looks like
x_{t} = \mu + B_1 x_{t-1} + … + B_p x_{t-p}+ e_{t}
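
To make the lag-one formula concrete, here is a minimal sketch that simulates it with NumPy. The function name `simulate_var1`, the coefficient values, and the starting prices are all made up for illustration; they are not estimated from real IBM or GE data.

```python
import numpy as np

def simulate_var1(mu, B, x0, n_steps, noise_std=1.0, seed=0):
    """Simulate x_t = mu + B @ x_{t-1} + e_t with Gaussian noise e_t."""
    rng = np.random.default_rng(seed)
    x = np.empty((n_steps + 1, len(x0)))
    x[0] = x0
    for t in range(1, n_steps + 1):
        e_t = rng.normal(scale=noise_std, size=len(x0))
        x[t] = mu + B @ x[t - 1] + e_t
    return x

# Hypothetical coefficients, chosen only for illustration.
mu = np.array([1.0, 0.5])
B = np.array([[0.90, 0.05],
              [0.10, 0.85]])
x0 = np.array([100.0, 50.0])

prices = simulate_var1(mu, B, x0, n_steps=250)
print(prices[:3])  # first three rows: columns are the two simulated series
```

Each row of `prices` is one time step, and each column is one stock, which is the same layout as the vector x_t in the formula.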

Now, if you recall from studying cointegrated time series, taking the time-wise difference may help us create a stationary series. We’ll denote the time-wise difference as:
\Delta x_{t} = x_{t} - x_{t-1}
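
As a small illustration of the time-wise difference, the sketch below applies `np.diff` to a tiny array of invented prices (the numbers are hypothetical, not market data).

```python
import numpy as np

# Hypothetical prices: rows are time steps, columns are [IBM, GE].
x = np.array([[100.0, 50.0],
              [101.5, 49.0],
              [101.0, 50.5],
              [103.0, 51.0]])

# Row t-1 of delta_x holds x_t - x_{t-1}, so there is one fewer row than in x.
delta_x = np.diff(x, axis=0)
print(delta_x)
```

Note that differencing always costs one observation, since the first price has no earlier price to subtract.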

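Next, we can rewrite the model for x_t as a Vector Error Correction Model (VECM). Note that the matrix B here is not the same as the B in the vector autoregression above; it is a new matrix built from the VAR coefficients (for a lag of one, it is the VAR’s B minus the identity matrix). The VECM looks like this:
\Delta x_{t} = \mu + B x_{t-1} + C_1 \Delta x_{t-1} + … + C_p \Delta x_{t-p} + e_t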

Notice how the B x_{t-1} term contains the previous period’s values themselves, and not the time-wise difference like the other terms. All of the subsequent terms to its right are time-wise differences. The Johansen test estimates the rank of the matrix B, which tells us how many independent cointegrating relationships exist among the stocks. To do this, it uses an eigenvalue decomposition (let’s not worry about the details for now) to test whether the matrix B has a rank of 0, 1, 2, or 3, up to the number of stocks that we’re looking at (most likely 2 or 3).
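
To see how the levels term arises, here is a minimal numerical sketch for the simplest case: a lag of one with no lagged-difference terms. Subtracting x_{t-1} from both sides of the lag-one vector autoregression leaves a matrix equal to the VAR matrix minus the identity multiplying x_{t-1}. The names `B_vecm`, `B_var`, `alpha`, and `beta`, and all the numbers, are assumptions made for this illustration.

```python
import numpy as np

# Illustrative assumption: build the VECM matrix as an outer product of two
# vectors, so that it has rank 1 by construction.
alpha = np.array([-0.2, 0.1])   # how strongly each stock reacts to the spread
beta = np.array([1.0, -2.0])    # the combination of prices that is stationary
B_vecm = np.outer(alpha, beta)  # plays the role of B in the VECM
B_var = np.eye(2) + B_vecm      # the matching lag-one VAR coefficient matrix

mu = np.array([0.3, 0.1])
x_prev = np.array([100.0, 50.0])
e_t = np.array([0.4, -0.2])

x_t = mu + B_var @ x_prev + e_t               # VAR form: next value
delta_from_vecm = mu + B_vecm @ x_prev + e_t  # VECM form: next change

print(np.allclose(x_t - x_prev, delta_from_vecm))  # both forms agree
print(np.linalg.matrix_rank(B_vecm, tol=1e-10))    # rank of the VECM matrix
```

The two forms describe the same model; the VECM form just makes the rank of the matrix on x_{t-1} the quantity of interest.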

Here’s a quick refresher on what the rank of a matrix is.

If you have three equations like this:

1x + 1y = 1

2x + 2y = 2

3x + 3y = 3

What do you notice about all of these equations? It looks like you only need one of the equations to describe all three of them. In this case, we’d say that the rank is 1.
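
If you would like to check this numerically, the sketch below stores the coefficients of the three equations in a matrix and asks NumPy for its rank.

```python
import numpy as np

# Each row holds the x and y coefficients of one equation.
coefficients = np.array([[1, 1],
                         [2, 2],
                         [3, 3]])

# Including the right-hand side as a third column does not change the answer,
# because every row is still a multiple of the first one.
augmented = np.array([[1, 1, 1],
                      [2, 2, 2],
                      [3, 3, 3]])

print(np.linalg.matrix_rank(coefficients))  # 1
print(np.linalg.matrix_rank(augmented))     # 1
```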

Similarly, when the Johansen test checks whether the rank of matrix B is 0, 1, 2 or 3, let’s see what this means for us practically. Suppose we were trying to see if 3 stocks were cointegrated. If the rank was zero, then there is no cointegration among the stocks that we looked at. If the Johansen test results showed that the rank of matrix B was likely 1, then there is one linear combination of the three stocks that forms a stationary series, and we can use it as our spread. If the rank was likely 2, then there are two independent combinations that are stationary, which gives us more than one candidate spread to choose from. If the rank was 3 (full rank), the test is telling us that the price series are already stationary on their own, which is unusual for stock prices, so cointegration isn’t the right description in that case.

To determine the rank, the Johansen test runs a sequence of hypothesis tests: is the rank 0, at most 1, at most 2, and so on, up to the number of stocks in the test (probably 2 or 3). Comparing each test statistic (the trace statistic or the maximum eigenvalue statistic) against its critical value lets you decide, with a certain level of confidence, how many cointegrating relationships these stocks have.

Okay, we’re almost there! The Johansen test gives us a vector that we can use as the weights we assign to each stock. If you are curious, this is the eigenvector associated with the largest eigenvalue from the eigenvalue decomposition. But again, let’s not worry about how to do eigenvalue decomposition, and just see how to use this vector of weights. These are the weights that we mentioned earlier when computing the linear combination of the stock prices, which is used in the same way as the spread.
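
For the curious, the sketch below shows roughly where that eigenvector comes from, in the simplest setting of a lag-one model with a constant and no lagged-difference terms. It is a simplified illustration written with NumPy, not the full procedure a statistics library would run; the function name `johansen_eigen` and the simulated two-stock data are assumptions.

```python
import numpy as np

def johansen_eigen(prices):
    """Eigenvalues (descending) and eigenvectors for a lag-one model
    with a constant and no lagged-difference terms."""
    dx = np.diff(prices, axis=0)       # time-wise differences
    lagged = prices[:-1]               # previous period's values
    r0 = dx - dx.mean(axis=0)          # demeaning accounts for the constant
    r1 = lagged - lagged.mean(axis=0)
    n_obs = len(dx)
    s00 = r0.T @ r0 / n_obs
    s11 = r1.T @ r1 / n_obs
    s01 = r0.T @ r1 / n_obs
    # Eigen-decomposition of S11^{-1} S10 S00^{-1} S01
    m = np.linalg.solve(s11, s01.T @ np.linalg.solve(s00, s01))
    eigvals, eigvecs = np.linalg.eig(m)
    order = np.argsort(eigvals.real)[::-1]
    return eigvals.real[order], eigvecs.real[:, order]

# Simulated data: stock_2 follows half of stock_1's trend, so the
# combination stock_1 - 2 * stock_2 should be stationary.
rng = np.random.default_rng(7)
n = 2000
trend = np.cumsum(rng.normal(size=n))
prices = np.column_stack([
    trend + rng.normal(scale=0.5, size=n),
    0.5 * trend + rng.normal(scale=0.5, size=n),
])

eigvals, eigvecs = johansen_eigen(prices)
weights = eigvecs[:, 0] / eigvecs[0, 0]  # scale so the first weight is 1
print("eigenvalues:", eigvals)
print("weights:", weights)               # expected to be close to [1, -2]
```

Eigenvectors are only defined up to a scale factor, which is why the weights are rescaled so that the first one equals 1.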

So if we get w_1, w_2, w_3 from the eigenvector w, we use these as weights on each stock, as we saw earlier:

w_1 \times stock_1 + w_2 \times stock_2 + w_3 \times stock_3 = spread
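
Here is a minimal sketch of that weighted sum in NumPy, along with the deviation of the spread from its historical average. The prices and weights are hypothetical numbers chosen so the arithmetic is easy to follow, and `spread_from_weights` is a helper name made up for this example.

```python
import numpy as np

def spread_from_weights(prices, weights):
    """Compute w_1 * stock_1 + w_2 * stock_2 + w_3 * stock_3 for every row."""
    return prices @ weights

# Hypothetical prices: rows are time steps, columns are the three stocks.
prices = np.array([[10.0, 20.0, 30.0],
                   [11.0, 19.0, 31.0],
                   [12.0, 21.0, 29.0]])
weights = np.array([1.0, -0.5, 0.25])  # hypothetical weights, not test output

spread = spread_from_weights(prices, weights)
deviation = spread - spread.mean()     # distance from the historical average

print(spread)     # [7.5  9.25 8.75]
print(deviation)  # [-1.    0.75  0.25]
```

A negative weight corresponds to shorting that stock, and the relative sizes of the weights give the proportion of shares to hold in each.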

To summarize, the Johansen test figures out whether a group of stocks is cointegrated, and if so, how to calculate a “spread” that we’ll keep track of for temporary deviations from its historical average. It also gives us the proportion of shares to trade for each stock.